CoMRI: A Compressed Multi-Resolution Index Structure for Sequence Similarity Queries
Authors
Abstract
In this paper, we present CoMRI (Compressed Multi-Resolution Index), our system for fast sequence similarity search in DNA sequence databases. We employ the Virtual Bounding Rectangle (VBR) concept to build a compressed, grid-style index structure. An advantage of the grid format over trees is that the location of a subsequence is given by the position of the corresponding VBR in the VBR list. Thanks to VBRs, the index structure fits easily into a reasonable amount of memory. Together with a new, optimized multi-resolution search algorithm, this improves query speed significantly. Extensive performance evaluations on human chromosome sequence data show that VBRs reduce index storage size by 80%-93% compared to MBRs (Minimum Bounding Rectangles), and that the new search algorithm prunes almost all unnecessary VBRs, which keeps disk I/O and CPU costs low. According to the results of our experiments, CoMRI is at least 100 times faster than MRS, another grid index structure introduced very re-
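The abstract's point about grid layouts, that a box's position in the list already encodes where its subsequences start, can be illustrated with a minimal sketch. This is not the paper's VBR encoding, which the abstract does not specify. It assumes, hypothetically, that each sliding window is summarized by its nucleotide count vector and that consecutive windows are grouped into plain bounding boxes. The names `frequency_vector`, `build_grid_index`, and `box_start_range` are illustrative only.

```python
from collections import Counter

ALPHABET = "ACGT"


def frequency_vector(window):
    """Count of each nucleotide in the window, in ALPHABET order (assumed summary)."""
    counts = Counter(window)
    return tuple(counts.get(ch, 0) for ch in ALPHABET)


def build_grid_index(sequence, window, box_capacity):
    """Group consecutive sliding windows into bounding boxes.

    Box i covers the windows starting at positions
    i * box_capacity .. (i + 1) * box_capacity - 1, so no start
    position has to be stored with the box itself.
    """
    vectors = [
        frequency_vector(sequence[s:s + window])
        for s in range(len(sequence) - window + 1)
    ]
    boxes = []
    for i in range(0, len(vectors), box_capacity):
        group = vectors[i:i + box_capacity]
        low = tuple(min(col) for col in zip(*group))
        high = tuple(max(col) for col in zip(*group))
        boxes.append((low, high))
    return boxes


def box_start_range(box_index, box_capacity, total_windows):
    """Recover the window start positions covered by a box from its list index."""
    first = box_index * box_capacity
    last = min(first + box_capacity, total_windows) - 1
    return first, last


sequence = "ACGTACGTAA"
boxes = build_grid_index(sequence, window=4, box_capacity=3)
total_windows = len(sequence) - 4 + 1
for i, (low, high) in enumerate(boxes):
    print(i, box_start_range(i, 3, total_windows), low, high)
```

In a tree index, each leaf would have to carry explicit offsets into the sequence. Here the offsets fall out of the list index, which is the property the abstract credits to the grid format.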
Similar resources
Scalable Sequence Similarity Search and Join in Main Memory on Multi-cores
Similarity-based queries play an important role in many large-scale applications. In bioinformatics, DNA sequencing produces huge collections of strings that need to be compared and merged. One strategy to speed up similarity-based queries is parallelization on clusters using MapReduce. However, distributing data over a cluster also incurs a high cost. At the same time, modern hardware offers pa...
Efficient Multi-Query Evaluation over Compressed XML Data in a Distributed Environment
With increasing dissemination of XML, multi-query processing has become a practical and meaningful issue to resolve. However, the verbosity of XML data causes inefficient query processing and high network bandwidth consumption. This paper addresses the problem of evaluating a heavy load of subscribed queries as a whole (or simply multiqueries) over compressed XML data in a distributed environme...
Indexing Spatially Sensitive Distance Measures Using Multi-resolution Lower Bounds
Comparison of images requires a distance metric that is sensitive to the spatial location of objects and features. Such sensitive distance measures can, however, be computationally infeasible due to the high dimensionality of feature spaces coupled with the need to model the spatial structure of the images. We present a novel multi-resolution approach to indexing spatially sensitive distance me...
Image Retrieval Using Dynamic Weighting of Compressed High Level Features Framework with LER Matrix
In this article, a method for database retrieval is proposed. The multi-resolution modified wavelet transform of each image is computed, and the standard deviation and average are used as the textural features. Then, the proposed modified bit-based color histogram and edge detectors are used to define the high-level features. A feedback-based dynamic weighting of shap...
LRM-Trees: Compressed Indices, Adaptive Sorting, and Compressed Permutations
LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas: compres...